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ABSTRACT 

Current models of galaxy formation applied to understanding the large-scale struc- 
ture of the universe have two parts. The first is an accurate solution of the equations 
of motion for the dark matter due to gravitational clustering. The second consists of 
making physically reasonable approximations to the behavior of baryons inside dark 
matter halos. The first uses large, computationally intensive, n-body simulations. We 
argue that because the second step is, at least at present, uncertain, it is possible to 
obtain similar galaxy distributions without solving the first step exactly. 

We describe an algorithm which is several orders of magnitude faster than n- 
body simulations, but which is, nevertheless, rather accurate. The algorithm combines 
perturbation theory with virialized halo models of the nonlinear density and velocity 
fields. For two- and three-point statistics the resulting fields are exact on large scales, 
and rather accurate well into the nonlinear regime, particularly for two-point statistics 
in real and redshift space. We then show how to use this algorithm to generate mock 
galaxy distributions from halo occupation numbers. As a first application, we show 
that it provides a good description of the clustering of galaxies in the PSCz survey. 

We also discuss applications to the estimation of non-Gaussian contributions to 
error bars and covariance matrix of the power spectrum, in real and redshift space, 
for galaxies and dark matter. The results for the latter show good agreement with 
simulations, supporting the use of our method to constrain cosmological parameters 
from upcoming galaxy surveys. 
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1 INTRODUCTION 

Upcoming large galaxy surveys will provide a major ad- 
vance in our understanding of the large-scale structure of 
the universe, helping to determine cosmological parameters, 
constrain models of galaxy formation and the properties 
of primordial fluctuations that gave rise to galaxies. How- 
ever, extracting information about cosmological parameters 
from the galaxy distribution requires an accurate modeling 
of non-linear effects such as gravitational evolution, redshift 
distortions, and galaxy biasing (i.e., the relationship between 
the galaxy and the underlying dark matter distribution). 

The physics of galaxy formation is not yet fully un- 
derstood. Numerical simulations which include the effects 
of both gravitational instability as well as gas hydrodynam- 
ics in cosmological volumes are only now becoming available 
(e.g. Cen & Ostriker 2000; Pearce et al. 2001). But the role of 
feedback from star formation, and how to incorporate it into 
simulations, remains uncertain. As a result, a complemen- 



tary approach to generating realistic galaxy distributions 
has been to use semianalytic models (see, e.g., Kauffmann 
et al. 1999; Somerville & Primack 1999; Cole et al. 2000). 

These models build on the work of White & Rees (1978) 
and White & Frenk (1991), in which galaxy formation is 
treated as a two-stage process: dark matter haloes virialize, 
and gas cools and forms stars within these virialized ha- 
los. The first step is solved numerically: n-body simulations 
of nonlinear gravitational clustering are used to follow the 
formation of the dark matter halos. The second step uses 
a number of reasonable prescriptions for approximating the 
complicated nonlinear physics of gas cooling in gravitational 
potential wells to incorporate star and galaxy formation into 
the simulations which otherwise only describe the effects of 
gravitational clustering. The predictions of the semi-analytic 
models, while similar to those from smoothed particle simu- 
lations which solve the hydrodynamic equations for the gas 
numerically, can differ by as much as fifty percent in the two- 
point correlation function (e.g. Benson et al. 2001). There- 
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fore, it is this second step which is the more uncertain one, 
since we do not yet understand galaxy formation from first 
principles. On the other hand, it is the first step which is 
the most time consuming. 

In this paper we present a method which generates re- 
alistic galaxy distributions in a very small fraction of the 
time it takes for methods that require n-body simulations of 
gravitational clustering. By realistic distributions, we mean 
specifically that the point distributions our method gener- 
ates can reproduce the observed galaxy clustering statistics; 
in particular, the distribution of galaxy counts-in-cells, the 
power spectrum and the bispectrum. The philosophy of our 
approach is that since the step which involves generating 
galaxies from knowledge of the dark matter distribution is 
necessarily uncertain, one does not need to start with a fully 
correct dark matter distribution to generate galaxy correla- 
tions to the extent allowed by the present understanding of 
galaxy formation. By suitably approximating the non-linear 
structures which form in the gravitational clustering simu- 
lations, one can hope to still obtain a reasonably accurate 
galaxy distribution by slightly altering the mapping from 
dark matter to galaxies within the uncertainties. 

To obtain an approximation to the fully nonlinear dark 
matter distribution we first generate the large-scale dark 
matter distribution using second-order Lagrangian pertur- 
bation theory (2LPT). This reproduces the correct two and 
three-point statistics at large scales, and approximates the 
four-point and higher-order statistics very well (Moutarde 
et al. 1991; Buchert et al. 1994; Bouchet et al. 1995; Scoc- 
cimarro 2000). Note that use of 2LPT, rather than linear 
theory or the Zel'dovich approximation, is essential to incor- 
porate accurately the large-scale departures from Gaussian 
initial conditions. 

The 2LPT correlations are incorrect on small scales, 
where perturbation theory breaks down. We build up more 
accurate small-scale correlations by using the amplitude of 
the 2LPT density field to determine the masses and positions 
of virialized halo centers, and we then distribute particles 
around the halo centers with realistic density profiles. We 
use the 2LPT code described by Scoccimarro (1998,2000) 
to set up the perturbation theory density and velocity 
fields, and the merger history algorithm of Sheth & Lem- 
son (1999b) to partition the 2LPT density field into haloes. 
Finally, a galaxy distribution can be generated by specify- 
ing how many galaxies populate haloes of a given mass (e.g. 
Mo, Jing & Borner 1997; Jing, Mo & Borner 1998). In this 
respect, our PTHalos algorithm treats the difference between 
the clustering statistics of the dark matter and galaxy dis- 
tributions in much the same way that recent analytic mod- 
els (Peacock & Smith 2000; Seljak 2000; Scoccimarro et al. 
2001) do. 

In this paper, we do not attempt to match other galaxy 
properties (e.g. the luminosity function) than clustering 
statistics. Statistics like the luminosity function can be es- 
timated rather easily by simply replacing the dark matter 
distribution of an n-body simulation with the PTHalos dis- 
tribution, and then running the usual semianalytic galaxy 
formation codes on top. Our motivation was mainly to de- 
velop a fast algorithm which can be used to generate real- 
istic non-Gaussian galaxy distributions, and thus provide a 
method to quickly explore parameter space for constraining 
cosmological parameters and galaxy formation models from 



galaxy surveys. In addition, the speed of our code makes fea- 
sible to construct a large number of mock galaxy catalogs 
in a reasonable CPU time from which reliable estimation of 
errors and covariance matrices can be derived. Application 
along these lines to imaging data in the SDSS survey will 
be considered elsewhere (Connolly et al. 2001; Dodelson et 
al. 2001; Scranton et al. 2001; Szalay et al. 2001; Tegmark 
et al. 2001). 

This paper is organized as follows. In Section we re- 
view 2LPT. In Section ^ we describe how we parametrize 
haloes and in Sectionlj how we place them in the 2LPT den- 
sity field. In Section |5j we compare the clustering statistics 
of the dark matter distribution of PTHalos to n-body sim- 
ulations. In Section ^ we describe how we map from dark 
matter to galaxies. We show that our method provides a 
sensible match to to the galaxies in the PSCz survey. We 
conclude in Section 0. 



2 SECOND-ORDER LAGRANGIAN PT 

In Lagrangian PT, the dynamics is described by the dis- 
placement field ^(q) which maps the initial particle posi- 
tions q into the final Eulerian particle positions x, 

x = q + *(q). (1) 

The equation of motion for particle trajectories x(t) is 



d 2 x 



c/x 



z3 + «Wr = -v* 



(2) 



where $ denotes the gravitational potential, and V the gra- 
dient operator in Eulerian coordinates x. Taking the diver- 
gence of this equation we obtain 



J(q,T) V 



rf 2 x „ ,. „ dx 



= -n m n 2 (j-i) 



(3) 



where we have used Poisson equation together with the fact 
that 1 + 5(x) = J -1 , and the Jacobian J(q, r) is the deter- 
minant 



J(q, t) ee Dot 



(4) 



where vJ^j ee d^i/dqj. Equation Q can be fully rewrit- 
ten in terms of Lagrangian coordinates by using that V; 
•) + Voj > where V 5 ee <9/<9q denotes the gradient 

operator in Lagrangian coordinates. The resulting non-linear 
equation for ^(q) is then solved perturbatively, expanding 
about its linear solution, the Zel'dovich (1970) approxima- 
tion 

V,-* (1) = - J Di(r)«5(q). (5) 

Here <5(q) denotes the (Gaussian) density field imposed by 
the initial conditions and -Di(t) is the linear growth factor. 
The solution to second order describes the correction to the 
ZA displacement due to gravitational tidal effects and reads 



v,.«»=iAi(r)x;(*£?*j3-*a ) *^ 



(6) 



(e.g., Bouchet et al. 1995) where ^(t) denotes the second- 
order growth factor, which for flat models with non-zero 
cosmological constant A we have for 0.01 < fi m < 1 



D 2 (r) 



a; 



-1/143 



(7) 
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to better than 0.6% and 2.6%, respectively (Bouchet et al. 
1995). Since Lagrangian solutions up to second-order are 
curl-free, it is convenient to define Lagrangian potentials 
<A (1) and «!> (2) so that in 2LPT 



x(q) 



Z>i V 



qV {1) + D 2 V q 



(8) 



(9) 



and the velocity field then reads (t denotes cosmic time) 
v ee = -Di h H V q (1) + D 2 f 2 H V q <f> (2 \ 

where H is the Hubble constant, and the logarithmic deriva- 
tives of the growth factors = (d\nDi)/(d\na) can be ap- 
proximated for flat models with non-zero cosmological con- 
stant A and 0.01 < fi m < 1 



6/11 
m i 



(10) 

to better than 10% and 12%, respectively (Bouchet et al. 
1995). The accuracy of these two fits improves significantly 
for Sl m > 0.1, in the range relevant according to present 
observations. The time-independent potentials in Eqs. (^]) 
and (^) obey the following Poisson equations (Buchert et al. 
1994) 

V^ (1) (q) = 5(d), 



i 



(q) 



5>$(q) </>$(q)-(^(q)) 2 



(11) 

(12) 



Thus, 2LPT positions and velocities can all be deter- 
mined from the initial fluctuation field. In practice, we gen- 
erate 2LPT positions and velocities using the algorithm 
described in detail in Appendix D of Scoccimarro (1998), 
where the Poisson equations are solved by standard fast 
Fourier transform methods. As mentioned before, 2LPT 
recovers the exact two and three-point statistics at large 
scales, and approximates very well higher-order ones (see 
e.g. Fig. 15 in Bouchet et al. 1995, and Table 2 in Scocci- 
marro 2000). It is possible to improve on 2LPT by going to 
third-order in the displacement field (3LPT) , however it be- 
comes more costly due to the need of solving three additional 
Poisson equations (Buchert 1994). 3LPT reproduces exactly 
up to four-point statistics, and improves the behavior in the 
underdense regions (where 2LPT tends to overestimate the 
density, see Bouchet et al. 1995). 



3 FROM 2LPT TO VIRIALIZED HALOES 

The end result of 2LPT is a list of particle positions and 
velocities. These positions and velocities are not quite the 
same as those the particles would have had in a full n-body 
simulation which started from the same initial conditions. 
This section describes how to use our knowledge of fully 
nonlinear density and velocity fields to increase the agree- 
ment with simulations. 

The primary difference between the 2LPT density field 
and that from a full n-body simulation of nonlinear grav- 
itational clustering is that the 2LPT density field has no 
virialized objects. In a full simulation, however, most of the 
mass in a simulation box is partitioned into virialized haloes 
(e.g. Tormen 1998). For our purposes here, all virialized ob- 
jects have well defined edges, and the edge, called the virial 
radius, is defined so that all virialized haloes have the same 
spherically averaged density, whatever their mass: m oc r^. 



Therefore, in what follows, we will make the simplifying as- 
sumption that all virialized haloes are spherical. (In fact, 
haloes have a range of shapes; allowing for a distribution 
of shapes is a detail which does not change the logic of our 
method, so we will comment on it in more detail later.) 

We will approximate the effects of fully nonlinear clus- 
tering in two steps. First, we need a prescription for divid- 
ing up the 2LPT density field into a collection of virialized 
haloes. Once this has been done, we must decide how to dis- 
tribute the mass associated with each halo around the halo 
centre-of-mass. This second step is more straightforward, so 
we will describe it first. 



3.1 Virialized halo profiles 

As noted above, we will assume that all dark matter haloes 
are spherically symmetric. High resolution n-body simula- 
tions (Navarro, Frenk & White 1997) show that the spheri- 
cally averaged density run around the centre of a virialized 
halo containing mass m within the virial radius r v i r is well 
fit by 



p(r) _ A vir (z) c 3 /(c) 
p 3fi(z) x(l + x) 2 ' 



where x '■ 



"(m) ■ 



(13) 



where p(z) is the average density of the background universe 
at z, p(z) A v i r (z) /£l(z) is the average density within the 
virial radius (A v i r ~ 178 for all z in an Einstein de-Sitter uni- 
verse; it is sa 102 at z = for the ACDM model we present 
results for in this paper) , and /(c) = [In ( 1 + c) — c/(l + c)] . 
The density run is a broken power-law, with a shallower in- 
ner slope and a steeper outer slope. The exact shape of these 
two slopes is the subject of some debate. Although we will 
use the NFW form in what follows, it is trivial to modify 
our algorithm to generate profiles of the form given, e.g., by 
Hernquist (1990) or by Moore et al. (1999). The parameter c 
is often called the central concentration of the halo. As NFW 
noted, more massive haloes are less centrally concentrated. 
We will use the parametrization of this trend provided by 
Bullock et al. (2001): 



c(m) 



9 



(1 



■(— )' 

\m»o / 



(14) 



where m„o is the standard non-linear mass scale (defined in 
the next subsection). Numerical simulations show that not 
all haloes of mass m have the same density profile, there 
is considerable scatter. Fortunately, this scatter is well fit 
by using the same NFW shape for all haloes, but letting 
the concentration parameter have a Lognormal distribution 
with dispersion A(logc) « 0.2 (Jing 2000; Bullock et al. 
2001). We include this scatter in PTHalos. 

Equation (jl3|) says that if the centre of mass of an m- 
halo is at position x, then there will be Ndm m particles 
distributed around x according to equation ( p^ ) . This is eas- 
ily done. For example, if the halo profile were an isothermal 
sphere (p oc r~ 2 ), then particle positions around the halo 
centre (r, 8, (f>) could be got by drawing random numbers 
distributed uniformly between zero and r v i r for the radial 
distance r from the centre, uniform numbers between plus 
and minus one for cos 9, and uniform numbers between zero 
and two pi for </>. Generating an NFW profile is only slightly 
more complicated. 
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Figure 2. The same slice as previous figure for PTHalos. 



Because the density run depends on halo mass m, the 3.2 Halo masses and positions 
trick is to identify those positions in the 2LPT density field 

which should be identified with the centres-of-masses of m- Imagine comparing the 2LPT density and velocity fields 
haloes. The next subsection describes how to do this. with those from an n " bod y simulation which started from 

the same initial fluctuation field. One might imagine that the 
2LPT density and velocity fields contain information about 
where bound objects in the n-body simulation formed. For 
example, perhaps the densest 2LPT regions are those regions 
which, in the n-body simulation, collapsed to form bound 
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haloes. If so, then one might imagine running a friends-of- 
friends group finder on the 2LPT density field to identify 
these overdense regions. One could then use the number of 
particles associated with the friends-of-friends group as an 
estimate of the mass of the virialized halo which formed 
in the n-body simulation, and the position of the centre- 
of-mass of the group could be used as an estimate of the 
position of the corresponding halo. While such a procedure 
is possible in principle, running a group-finder can be quite 
time-consuming. This is the primary reason why we have 
adopted the approach we describe below. 

At any given time, which we will label by redshift z, 
virialized haloes have a wide range of masses. Let n(m\z) 
denote the comoving number density of haloes of mass m at 
z. The shape of this universal mass function distribution is 
well approximated by 



m 2 n(m\z) „, , dlnu 

- = v f\ v > Ti ' 

p amm 



(15) 



where p denotes the comoving density of the background, 
and 



vf(v) = 2A(l + (av 2 )- p ) ( 



2 \ 1 / 2 / 2 

av \ I av 

2iT J 6XP h — 



(16) 



Here v = S BC (z) /a(m, z) where S BC (z) is the collapse thresh- 
old given by the spherical collapse model (e.g. it is 1.68 in 
an Einstein-deSitter universe), a 2 (m,z) is the linear theory 
variance in the density field when smoothed on the comoving 
scale R = (3m/47rp) 1/3 at *, and A = 0.5,0.322, p = 0,0.3 
and a = 1,0.707 for the mass functions given by Press & 
Schechter (1974) and Sheth & Tormen (1999), respectively. 
(Sheth, Mo & Tormen 2001 show that these two cases may 
be related to models in which objects form from spherical or 
ellipsoidal collapses, respectively. Jenkins et al. (2001) com- 
pare these mass functions with n-body simulations.) The 
number of haloes falls exponentially at large masses. Let 
m, denote the mass at which this cut-off sets in. Then m* 
is defined by requiring er(m*, z) = 8 EC (z). The quantity m*o 
which should be used in equation (114) is got from setting 
<r(m*o,0) = 4o(0). 

The distribution of haloes in dense regions is different 
from that in underdense regions: n(m,8\z) ^ (1 + 5) n(m\z), 
where 5 denotes the overdensity of the region. For example, 
the ratio of massive to less massive haloes is larger in dense 
regions than in underdense ones. A simple model for this 
dependence is 



n(m, S\z) 



l + b(m\z)6 n(m\z), 



where 8 = M/pV — 1, and 



b(m\z) = 1 + 



iv 2 - 1 2p/S sc (z) 
5 sc (z) 1 + (av 2 )p 



(17) 



(18) 



(Mo & White 1996; Sheth & Tormen 1999). The actual de- 
tailed dependence of n(m, 8\z) on 5 can be computed follow- 
ing Lemson & Kauffmann (1999), Sheth & Lemson (1999a) 
and Sheth & Tormen (2001), so we will not repeat the anal- 
ysis here. 

If we place a grid on the 2LPT density field, then some 
of the cells will be denser than others. We must find an 
algorithm which ensures that the distribution of halo masses 



in the different density cells follows the correct n(m,5\z) 
relation. 

We can do this as follows. The actual nonlinear density 
in V at z is given by M/V = p(l + 5). Let So denote the 
value for the overdensity one would have predicted for such 
a region, had one used linear theory to make the predic- 
tion. Typically, we expect that if 5 > 0, then So < 5, and 
viceversa for underdense regions. A simple fitting formula to 
the relation between the nonlinear and the linear overdensi- 
ties, 5 and So, got from assuming that objects form from a 
spherical collapse in an Einstein-deSitter universe, has been 
provided by Mo & White (1996). We have checked that the 
following simple modification to their formula is accurate for 
all cosmologies of interest: 



So 



1.68647 



1.35 



1.12431 

(1 + 5)2/3 ~ (1 + 5)1/2 

S sc {z) 



+ 



0.78785 



(1 + 5)0-58661 

1.68647' (19) 
Now, what we are trying to do is to partition the mass 
M in V at z up into subregions, each of nonlinear density 
A v i r (,z) 3> 1, and hence each with predicted linear density 
S BC (z). 

This is extremely similar to what one does when one 
studies the merger histories of objects. Given the mass 
M and the linear theory density So associated with that 
mass, one studies the distribution of subclumps m,j of M 
at some earlier time when the critical density for collapse 
was 8 sc (z). A number of merger history tree codes which 
do this are available. The algorithm described by Sheth & 
Lemson (1999b) is simple and efficient, so we will use it to 
partition the 2LPT mass M up into virialized haloes. The 
only difference is that, whereas they were trying to generate 
the Press- Schechter mass function, we would like to gener- 
ate the distribution which fits the simulations better. This 
can be done by making the following simple change to their 
algorithm. 

Sheth & Lemson's algorithm loops over a range of 'time' 
steps, chooses a series of Gaussian random numbers gi at 
each time step, and then, for each i, makes use of g~ 2 . Their 
algorithm is fast because there are efficient ways to generate 
Gaussian variates. Therefore, if we modify the algorithm, we 
would like to do it in such a way that it still requires only 
Gaussian variates. For ACDM we have found that requiring 
the number of 'time' steps to equal ten, and then using a [3+ 
0.33/(1 + \gi\ 1 - 5 )]/\g l \ instead of o^f, where a = 0.707 is the 
same parameter as in equation (|16j), is all that is required 
to generate a mass function which is more like the n-body 
simulations (see Fig. ^|). 

Notice that because (1 + 5) < A v i r , equation ([l9]) guar- 
antees that So < S sc (z). Therefore, in essence, by using a 
merger tree code to perform the partition, we are making 
use of the following fact: a dense cell can be thought of as a 
region which will, at some point in the near future, become 
a virialized object. Therefore, when we view a dense cell at 
the present time, it is as though we are viewing a virialized 
object at a small 'lookback time' from the time it virialized. 
At small lookback times, most of the mass M of an object 
is likely to be partitioned up into just a few pieces which 
are each a substantial fraction of M, because there has not 
been enough time for M to have been split up into many 
smaller pieces. In hierarchical clustering models, the oppo- 
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Figure 3. The halo mass function at z = 1 (top) and 2 = (bot- 
tom). In each panel, squares show the output from PTHalo^ solid 
line shows the fitting function which describes the n-body simula- 
tions, and dashed line shows the Press— Schechter mass function. 
Stars and circles in the bottom panel show the mass function in 
the densest and the least dense three percent of the box at z = 0. 
These results correspond to flat ACDM realizations (as = 0.9, 
H m = 0.3 at z = 0) in a 100 Mpc/h box with mass resolution 
m min = lO 1O ' 5 M /h and R grid = 12.5 Mpc/h. 



site is true at large lookback times. This provides a simple 
reason why the ratio of massive to less massive haloes is 
larger in dense regions than in underdense ones. In partic- 
ular, this shows that when we partition the mass of a cell 
up as though we were partitioning the mass of a virialized 
halo up into subhaloes at high redshift, the only decision to 
be made is which redshift, which 'lookback time', to choose 
when running the merger tree algorithm. We have argued 
above that this choice depends on 5: the exact transforma- 
tion is given by equation (|l9|). 

To summarize, given the mass M and the 2LPT over- 
density S in a cell V, a merger tree algorithm is used to 
split M up into virialized haloes rrij. The next step is to 
decide where within V to place the halo centres. We do this 
by assuming that the most massive haloes within V occupy 
the densest subregion within V. Once this has been done, 
all that remains is to distribute particles around each halo 
centre as described in the previous subsection. 



3.3 Nonlinear velocities 

The final step is to account for the differences between the 
2LPT and n-body velocity fields. In the n-body simulations, 
it is a good approximation to assume that the motion of a 
particle can be written as the sum of two terms: 



V v i r + Vhalo, 



(20) 



motion of the halo as a whole (Sheth & Diaferio 2001). Fur- 
thermore, the virial motions within a halo are well approxi- 
mated by velocities which are independent Gaussians in each 
of the three cartesian components, with rms values which 
depend on halo mass: a 2 ir = (v 2 ir ) oc Gm/r,i, oc m 2//3 . In 
particular, we use (Bryan & Norman 1998; Sheth & Diaferio 
2001) 



a vir = 476/ vir (A ni £;(z) 2 ) 1/6 ( 



W 15 M Q /h 



1/3 



km/s, (21) 



where f vlI — 0.9 and A n i = I8iv 2 + 60x — 32x 2 with x = 

n{ z )-i, n( z ) = n (i+z) 3 /E(z) 2 , E(zf = n (i+z) :i +n A 

for a flat model with cosmological constant. 

We assume that virial motions are uncorrelated with 
the direction or amplitude of Vhaio- Therefore, if we substi- 
tute the 2LPT velocity vector at the position of the halo 
centre-of-mass for Vhaio, and then add an uncorrelated vec- 
tor of virial motions to each particle, and we do this for 
each of the particles in the halo, then the resulting velocity 
field should be quite similar to that in the n-body simula- 
tions. This is essentially a numerical implementation of the 
analytical model developed in Sheth et al (2001). 

This is the final step in converting the 2LPT density and 
velocity fields to fields which resemble the n-body simula- 
tions more closely. The next section describes the algorithm 
which implements all of this. 



4 PTHALOS 

4.1 The Algorithm 

Given a realization of the large-scale density field, PTHalos 
assigns halo centers to appropriate 2LPT particles, and then 
generates NFW density profiles around them. This is done 
as follows. 

• The desired mass resolution m m i n , the size of the sim- 
ulation volume, and the cosmological model set the abun- 



dance of halos of mass larger than m n 



N h 



Since halo 



where the first term represents the virial motion of the par- 
ticle within its parent halo, and the second term is the bulk 



centers are going to be identified as 2LPT particles, this sets 
the number of 2LPT particles to be used, A^lpt ~ A^aios, 
and thus their mass m,2LPT- 

• The 2LPT box is divided into cubic cells of size i? gr id, 
and the mass in 2LPT particles M; in each cell i is ob- 
tained. The choice of i? gr id is dictated by the competing 
requirements that it be small enough so that the mass dis- 
tribution is not rearranged at large scales (to preserve the 
correct correlations imposed by 2LPT) , and large enough so 
that, within the cell, there can be halos sufficiently massive. 
Unless otherwise noted, we use i? gr id = 15 Mpc/h. This cor- 
responds roughly to a spherical cell of 8 Mpc/h radius which 
is the natural non-linear scale. 

• Then, a merger history code is run on each cell i, which 
gives a partition of the cell mass Mi into halos of smaller 
mass rrij (Mi = mi + mi + ...) down to the mass resolution. 

• The 2LPT particles in a cell and the list of halos re- 
sulting from the merger history tree are matched so that 
most massive halos are centered about 2LPT particles in the 
densest regions. An exclusion volume of 2Mpc//i (lMpc//i 
at z = 1) radius is imposed about 2LPT centers in densi- 
ties larger than 5.6 (set by the turnaround density in the 
spherical collapse model) . Halos of mass rrij are constructed 
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about their 2LPT centers by sampling the NFW profile with 
the appropriate number of particles according to the mass 
resolution. 

• Velocities are assigned using v = V2lpt + v v i r , where 
V2lpt is the velocity of the 2LPT particle representing the 
halo center of mass, and v v i r is a virial velocity drawn at ran- 
dom from a Maxwellian distribution with one-dimensional 
dispersion o v \ T (mj) given by Eq. We require that the 
center of mass velocity of each halo equals that of the associ- 
ated V2lpt; this is always possible because A^lpt <; AWos- 
This maintains the correct 2LPT correlations between den- 
sity and velocity fields at large scales, which is important if 
we wish to have the correct large-scale redshift-space statis- 
tics. 

• A galaxy distribution can be obtained by specifying how 
many galaxies, N &a x(rrij) on average populate dark matter 
halos of mass rrij. Given this function, we use a binomial dis- 
tribution (as explained in detail in Scoccimarro et al. 2001) 
to sample the NFW profile with the resulting number of 
galaxies. Benson et al. (2000) describe one possible alterna- 
tive to the binomial. The binomial model we use has two 
free functions, the first and second moments of the num- 
ber of galaxies per halo of mass m. The first moment is 
parametrized by a broken power-law with a low-mass cut- 
off, as described in Section [| the second moment is related 
to the first taking into account that low-mass halos have a 
sub-Poisson dispersion (Kauffmann et al. 1999, Benson et 
al. 2000). 

Fig. [l] shows a slice of 150 Mpc//i a side and 6 Mpc//i 
thick from a 2LPT distribution corresponding to ACDM 
with f2 m = 0.3, Qa = 0.7, as = 0.90. Figure □ shows the 
same slice in PTHalos, after 2LPT high-density regions have 
been replaced by virialized halos. 



4.2 Memory Requirements and Speed 

The PTHalos code requires a memory of 50 bytes per par- 
ticle in the simulation box. Most importantly, a 300 3 parti- 
cle realization takes only about 5 minutes to generate on 
a single-CPU workstation, about 2 — 3 orders of magni- 
tude faster than n-body simulations. For particle numbers 
-Apar > 300 3 approximately equal time is spent generating 
the 2LPT density field, imposing the spatial exclusion of 
massive halos, generating profiles and velocities, and writ- 
ing the output files to disk. From tests we have performed up 
toArp ar = 400 3 ,the run time scales roughly as Np& r ln(A^ par ). 



5 PTHALOS VS. NUMERICAL SIMULATIONS 

We now turn to a quantitative comparison of clustering 
statistics for the dark matter obtained from PTHalos to n- 
body simulations. Application of PTHalos to the distribution 
of PSCz galaxies is discussed in Section ^. 



5.1 Mass Function 

A first test of our algorithm is to check that we obtain the 
correct mass function of dark matter halos. Fig. ^ shows 
the number density of haloes as a function of halo mass, 
for a simulation box of side 100 Mpc/h, mass resolution 
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Figure 4. The power spectrum as a function of scale for ACDM at 
2 = (top) and 2 = 1 (bottom). Symbols denote measurements in 
N-body simulations, solid lines correspond to PTHalos (averaged 
over 30 realizations), and dashed lines show linear perturbation 
theory. 
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Figure 5. The power spectrum in rcdshift-space at 2 = and 
2 = 1. The top panel shows the power spectrum monopole, the 
bottom panel corresponds to the quadrupole to monopole ratio. 
Line styles are as in Fig Q The dotted line in the bottom panel 
shows the fitting formula by Hatton & Cole (1999). 
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Figure 6. The skewness (53) and kurtosis (S4) as a function of 
smoothing scale R in real (top) and redshift (bottom) space. Cir- 
cles denote measurements in ten PTHalos realizations, and squares 
correspond to the average over four n-body simulations. 



m min = 10 w - 5 M Q /h and i? grid = 12.5 Mpc/h. The sym- 
bols show the universal mass function one gets after using 
the merger tree algorithm to partition the 2LPT mass into 
haloes, and the solid curves show the fitting formula which 
describes the universal mass function in n-body simulations 
(equations [l5| and ^). The symbols match the curves quite 
well, suggesting that our decision to use a merger tree algo- 
rithm was quite successful. The stars and circles in the z = 
panel show the mass function in the densest and the least 
dense three percent of the box. This shows that the densest 
cells contain the most massive haloes, as expected. 



5.2 Power Spectrum in Real and Redshift Space 

Figure ^| shows the power spectrum for the ACDM model 
at z = (ff 8 = 0.90, fl m = 0.3, fi A = 0.7) and 2 = 1 
(as = 0.55). The symbols in this and next figure denote 
the measurements in n-body simulations taken from Scocci- 
marro, Couchman & Frieman (1999a). The simulation was 
run by the Virgo Consortium; it corresponds to a single real- 
ization with 256 3 particles in a 240 Mpc/h box. The dashed 
lines show the linear power spectrum, and the solid lines cor- 
respond to the measured power spectrum averaged over 30 
PTHalos realizations. Figure |E| shows similar measurements 
for the power spectrum in redshift-space. The top panel cor- 
responds to the monopole at z — (top) and z = 1 (bottom) 
and the bottom panel shows the quadrupole to monopole ra- 
tio at z — (bottom) and z = 1 (top). Error bars for the 
simulations correspond to the dispersion among four differ- 
ent lines of sight for the redshift-space mapping, done in the 
plane-parallel approximation, as assumed throughout this 
paper. 



We see that the agreement between PTHalos and numer- 
ical simulations is very good, especially considering the sim- 
plicity of our algorithm. Similar results hold for the SCDM 
model. In redshift space, the power spectrum monopole is 
somewhat overestimated (for z = 0) at small scales, sug- 
gesting perhaps our model for virial velocities is oversimpli- 
fied (consistent with the analytic results presented in Sheth 
et al. 2001). Another possibility is inaccurate treatment of 
halo spatial exclusion effects; in redshift space halo-halo cor- 
relations are important down to smaller scales than in real 
space, because velocity dispersion suppresses halo profiles. 
We shall come back to this point in the next subsection when 
we study higher-order moments. 

The excellent agreement shown for the quadrupole to 
monopole ratio even at scales below zero-crossing is very 
encouraging, since until now there was no alternative to nu- 
merical simulations that could provide an accurate treat- 
ment of redshift distortions. The problem, as emphasized 
by Hatton & Cole (1998) is that phenomenological models, 
where linear perturbation theory predictions (Kaiser 1987) 
are convolved with a kernel which describes the effects of ve- 
locity dispersion (Peacock & Dodds 1994, Park et al. 1994), 
tend to underestimate non-linear effects at large scales. Sim- 
ilarly, recent models of redshift distortions based on the halo 
model (White 2001, Seljak 2001) which assume linear per- 
turbation theory for halo-halo correlations do not accurately 
reproduce the quadrupole moment, although they provide a 
good description of the power spectrum monopole (White 
2001). 

In our case the large-scale (k < 0.2 h/Mpc) behavior of 
the monopole and quadrupole is dictated by 2LPT and has 
little to do with velocity dispersion (unlike in phenomeno- 
logical models), and is reproduced very accurately. In fact, 
the bottom panel in Fig. g shows as dotted line the fitting 
formula found by Hatton & Cole (1999) empirically from an 
ensemble of n-body simulations. This relation, valid at scales 
larger than that of the quadrupole zero-crossing, fits our 
results extremely well. A detailed explanation of how per- 
turbation theory can accurately describe the large-scale be- 
havior of power spectrum multipoles will be presented else- 
where. Essentially, the large-scale pairwise velocity along the 
line of sight is strong enough to suppress the power spectrum 
monopole and drive the quadrupole to zero. The demonstra- 
tion of this effect requires an accurate treatment of the non- 
linear nature of the redshift-space mapping (Scoccimarro et 
al. 1999a), which in our numerical treatment is trivially im- 
plemented since 2LPT treats the dynamics perturbatively 
but the mapping to redshift space is done exactly. Note that 
the deviations from linear perturbation theory are not negli- 
gible, even at scales larger than the non-linear scale k w 0.2 
h/Mpc, in particular for the quadrupole. 

5.3 Higher-Order Moments in Real and Redshift 
Space 

Figure ^ shows a comparison of PTHalos against n-body 
simulations for higher-order moments of the density field in 
real (top) and redshift (bottom) space. The figure shows the 
skewness S3(R) and kurtosis Sa(R) as a function of smooth- 
ing scale R. Squares denote the average over 4 realizations 
of ACDM with 128 3 particles in a box of L box = 300 Mpc/h 
a side, obtained by running the Hydra code (Couchman et 



© 0000 RAS, MNRAS 000, 000,000 



PTHalos: A fast method for generating mock galaxy distributions 9 



al. 1995). The circles denote the average of 10 realizations 
of PTHalos with 300 3 particles in the same volume. 

The agreement in real space is very good, particularly 
for the skewness. The runs in this figure were made using 
R S rid = 20 Mpc/h, we found that for J? gr id = 15 Mpc/h, the 
skewness and kurtosis were underestimated at small scales 
R < 1 Mpc/h by 10% and 70%, respectively. Changing 
-Rgrid does not affect the power spectrum, but does affect 
increasingly more the higher-order moments which are most 
sensitive to the presence of high-mass halos. The choice of 
^grid = 15 Mpc/h is thus a bit small for Lb ox = 300 Mpc/h, 
which leads to an underestimation of the number of very 
massive halos (which occasionally share more than one cell 
15 Mpc/h a side). 

In redshift space (bottom panel in Fig. the situation 
is somewhat different, the small-scale skewness and kurtosis 
are actually higher in PTHalos than in n-body simulations. 
Increasing artificially the value of the velocity dispersion of 
halos, Eq. ( Jsi^ ) , does not fix the discrepancy (since the skew- 
ness and kurtosis are ratios of moments) , suggesting perhaps 
that the problem is due to our approximate treatment of 
halo exclusion. More work is needed to fully understand the 
origin of this discrepancy. 

5.4 Power Spectrum Covariance Matrix 

As mentioned in the introduction, one of the motivations 
behind PTHalos is to provide a efficient tool for calculating 
accurate error bars and covariance matrices for clustering 
statistics including the effects of non-linear evolution, red- 
shift distortions and galaxy biasing, that can be used to 
constrain cosmological parameters, galaxy formation mod- 
els and the statistics of primordial fluctuations from analysis 
of clustering in upcoming galaxy surveys. 

Here we consider what impact these effects have on the 
error bars and covariance matrix of the power spectrum. 
As it is well known, non-linear effects lead to increased er- 
ror bars in individual band power estimates and introduce 
correlations between them that are absent in the Gaussian 
case (Meiksin & White 1999; Scoccimarro, Zaldarriaga & 
Hui 1999b; Hamilton 2000). In this Section we compare the 
predictions of PTHalos for the covariance matrix of the dark 
matter power spectrum against the numerical simulation re- 
sults in Scoccimarro et al. (1999b), and present results that 
include redshift distortions and galaxy biasing as well. 

All the covariance matrices are calculated using 300 re- 
alizations of PTHalos. The SCDM (fi m = 1, <x 8 = 0.61) re- 
alizations contain 200 3 particles in a box of side Lbox = 100 
Mpc/h (identical volume and band power binning as the 
simulations we compare to, with bin width 8 k — 27r/100 
h/Mpc). The ACDM (fi m = 0.3, fl A = 0.7, cr 8 = 0.9) real- 
izations contain 300 3 particles in a box of side Lbox = 300 
Mpc/h. Finally, we construct realizations of galaxy distri- 
butions as described above with halo occupation numbers 
given by Eq. ( [22] ) , derived in the next section from compari- 
son to the PSCz survey. These have about 21.4 x 10 6 galaxies 
in a box of side Lbox = 300 Mpc/h (we do not include the 
survey selection function, our only purpose here is to explore 
the effects of galaxy weighing on the covariance matrix). In 
these cases, the width of bins in k-space is Sk = 0.05 h/Mpc 

Figure Q shows the results from PTHalos (solid lines) 
compared to the measurements obtained in Scoccimarro et 



al. (1999b) for the SCDM model. The top panel shows the 
ratio of the power spectrum errors (diagonal elements of 
the covariance matrix, Cu) to those under the assumption 
of Gaussianity, . PTHalos seems to overestimate the er- 
rors at small scales by perhaps as much as 50%, although 
we regard this as a preliminary result since our n-body re- 
sults contain only 20 realizations. Indeed, from our PTHalos 
Monte Carlo pool we see that error bars estimated from 
20 realizations typically have a scatter of about 40% com- 
pared to the results from 300 realizations. The remain- 
ing three panels in Fig. fj\ show the cross-correlation coef- 
ficient, tij = dj / \J CuCjj, between band powers centered 
at kj = 0.32,0.88,1.52 h/Mpc as a function ki. Here the 
agreement seems much better, although it seems PTHalos 
slightly overestimates cross-correlations. 

Figure ^ shows results in the same format as Fig. ^ 
but for ACDM PTHalos realizations for the dark matter 
power spectrum covariance matrix in real space (solid lines) , 
redshift space (dotted lines), and for galaxies [with halo 
occupation numbers given by Eq. (p2^] in redshift space 
(dashed lines). The effects of redshift distortions is to sup- 
press the non-Gaussian contribution to the errors and the 
cross-correlations between band powers (Meiksin & White 
1999; Scoccimarro et al. 1999b), due to the fact that velocity 
dispersion suppresses the higher-order moments, see Fig. pi 
For galaxies, there is the additional effect of weighing the 
contribution of dark matter halos by the galaxy occupation 
number. At the scales shown here, this weighing suppresses 
non- Gaussianity and thus errors and cross-correlations. We 
find however that for k > 3 h/Mpc (not shown), where the 
galaxy power spectrum is larger than the dark matter power 
spectrum, the situation reverses and galaxies have larger 
errors and cross-correlations than dark matter (as can be 
guessed e.g. from the asymptotic behavior in the top panel 
of Fig. §). 

Although galaxies in redshift space are generally less 
affected by non- Gaussianity, the effect is strong enough to 
be very important for constraining cosmological parameters 
from galaxy redshift surveys. Methods developed to decorre- 
late band powers (Hamilton 2000) were shown to work very 
well for the mass power spectrum when non-Gaussianities 
are modelled by the hierarchical model. It would be interest- 
ing to test these methods with PTHalos galaxy realizations 
to see how well they perform. 

Another interesting application of our code would be to 
weak gravitational lensing, where studies have been made of 
error bars and covariance matrices for the power spectrum 
and moments of the convergence field (White & Hu 1999, 
Cooray & Hu 2000, Van Waerbeke et al 2001). 



6 GALAXY DISTRIBUTIONS: APPLICATION 
TO THE PSCZ SURVEY 

As discussed above, one the motivations behind PTHalos is 
to apply it to galaxy distributions, where its accuracy for 
galaxy clustering is expected to be comparable to standard 
methods which are computationally much more costly. 

As a first example, we apply PTHalos to clustering in 
the PSCz survey (Saunders et al. 2000) assuming an un- 
derlying ACDM (Q m — 0.3, Qa = 0.7) cosmology. We use 
the following measurements of clustering statistics: the real- 
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Figure 7. The Power Spectrum Covariance Matrix in SCDM 
n-body simulations (symbols) and PTHalos (solid lines). The top 
panel shows the ratio of the power spectrum errors to those under 
the assumption of Gaussianity. The remaining three panels show 
the cross-correlation coefficient rij between band powers centered 
at kj = 0.32,0.88, 1.52 h/Mpc as a function ki (Gaussianity cor- 
responds to Tij = 8ij). 
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Figure 8. Same as Fig. [?], but for ACDM PTHalos realizations. 
Solid lines denote the real space dark matter power spectrum 
covariance matrix, whereas dotted lines correspond to rcdshift 
space. Dashed lines show redshift-space measurements of galaxies 
given by Eq. @. 



space power spectrum (Hamilton & Tegmark 2001) inverted 
from redshift-space, the skewness and kurtosis in redshift 
space as a function of scale (Szapudi et al. 2000) and the 
redshift-space bispectrum (Feldman et al. 2001). 

In order to specify a galaxy distribution, we need to 
specify the moments of the number of galaxies JV gal per halo 
of mass m. We use a binomial distribution, as described in 
Scoccimarro et al. (2001), which has only 2 free functions, 
the first and second moments of JV ga i(m). We relate these 
two by using the semianalytic model results of Kaufhnann 
et al. (1999), where the dispersion in N ga ,\(m) is sub-Poisson 
for dark matter halos of mass m < 10 13 Mq/1i, and Poisson 
otherwise. We then searched for a first moment parametrized 
by a broken power-law that leads to a good match with the 
real-space power spectrum of PSCz galaxies, and found 



(iV ga i(m)) =0.7 (m/m ) c 



(22) 



where a = for 8 x 10 lu M Q //i < m < mo, a = 0.7 for 
m > mo, mo = 4 x 10 11 Mq//i. By definition, 7V ga i = for 
masses below the mass resolution of the PTHalos realization, 
8x 10 10 Mq//i. The resulting power spectrum is shown in the 
top panel in Fig. |^ in solid lines, compared to the inferred 
power spectrum from the PSCz survey, shown in symbols 
with error bars (Hamilton & Tegmark 2001). Jing, Borner 
and Suto (2001) found recently that a similar relation, with 
a — 0.75, fits the two-point correlation function of PSCz 
galaxies. 

We then calculated the skewness, kurtosis, and bispec- 
trum in redshift-space (bottom panel in Fig. |^ and Fig. 1^) . 
The results turned out to be in very good agreement with the 
measurements in the PSCz survey (we only consider scales 
larger than R = 3 Mpc/h in Fig. M given that PTHalos is 
not yet accurate enough for higher-order moments in red- 
shift space, see Fig ^|). This is very encouraging, because 
we did not attempt to fit them by adjusting the relation in 
equation (p^). A similar result has been found by Szapudi 
et al. (2000) using the semianalytic models of Benson et al. 
(2000). 

The resulting bias parameters are also in good agree- 
ment with the full likelihood analysis of the PSCz bis- 
pectrum (Feldman et al. 2001), l/6i w 1.2 ± 0.2 and 
62/&1 ~ —0.4 ± 0.2. Indeed, from taking the ratio of the 
power spectrum of the PTHalos galaxies to the underlying 
dark matter spectrum at large scales, k < 0.3 h/Mpc, we 
find I/61 = 1.28. Comparing the large-scale skewness for 
the dark matter, 5*3 = 2.6, and for the PTHalos galaxies, 
S3 = 2, and using S' 3 w S 3 /bi + 36 2 /&i (Fry & Gaztanaga 
1993), yields b 2 /b\ = -0.44. 

We stress that the result in Eq. (^2|) should not be con- 
sidered a full analysis of constraints on galaxy occupation 
numbers from clustering of PSCz galaxies, since we have not 
explored the parameter space for JV ga i(m) in a detailed fash- 
ion and only considered the constraint from the power spec- 
trum measurements. However, the fact that this exercise led 
to very reasonable higher-order moments in redshift space 
gives us confidence that PTHalos can be successfully used to 
constrain galaxy formation models by using measurements 
of redshift-space clustering statistics at scales 7? > 3 Mpc/h. 
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Figure 9. Top panel: real-space power spectrum inferred from 
the PSCz survey (Hamilton & Tegmark 2001) com par ed to the 
predictions for galaxies obeying the relation in Eq.(G2j). Bottom 
panel: same for the skewness S3 and kurtosis 54 (Szapudi ct al 
2000). 



7 CONCLUSIONS 

We have described PTHalos, a fast algorithm which gener- 
ates point distributions which have similar clustering statis- 
tics to those of the observed galaxy distribution. We have 
shown that our approach, although based on simple approx- 
imations, is able to reproduce the results of numerical sim- 
ulations very well and requires minimal computational re- 
sources. Its speed and flexibility makes it suitable to ex- 
plore parameter space in constraining cosmological parame- 
ters and galaxy formation models that can be later studied 
in more detail if necessary by perhaps more accurate (and 
computationally costly) methods. 

As we noted in the introduction, although we only used 
PTHalos to model galaxy clustering, if one uses the PTHalos 
dark matter distribution in place of an n-body simulation, 
then, by placing a semianalytic galaxy formation package 
on top, one can model other properties of galaxies, such 
as luminosities and colours, as well. If one is interested in 
how bulge-to-disk ratios, or the correlation between galaxy 
luminosity and velocity dispersion, depend on local density, 
then the models described by, e.g., Dalcanton, Spergel & 
Summers (1997), Mo, Mao & White (1998), van den Bosch 
(2000), etc. can also be placed on top of the PTHalos dark 
matter distribution. These models may differ substantially 
in their formulation and in the specific physical processes 
which were considered relevant. The fact that PTHalos is 
several orders of magnitude faster than approaches which 
require the output of an n-body simulation means that it 
offers an efficient way of testing these different models. 

There is an interesting history of making mock catalogs 
of the galaxy distribution in which one does not start with 



Figure 10. The redshift-space bispcctrum in the weakly non- 
linear regime (k < 0.3 h/Mpc) as a function of triangle shape for 
galaxies populating halos as in Eq. (p2|) , compared to the PSCz 
bispectrum as measured by Feldman et al. (2001). The plot cor- 
responds to wavectors in a ratio of approximately k\/k2 = 0.5 
and angle between them equal to 8. Error bars here represent the 
scatter in each bin from measurements of triangles of the same 
shape but somewhat different overall scale, and should only be 
considered a rough estimate of the uncertainties, see Feldman et 
al. (2001) for a full analysis. 



an n-body simulation. Soneira & Peebles (1978) describe an 
algorithm which generates point distributions which have 
the same small-scale two-point statistics as the observations. 
However, their method does not allow them to generate ve- 
locities and, because they were designed to work on small 
scales only, they do not include large-scale correlations. Mo- 
tivated by the fact that the observed galaxy distribution is 
approximately Lognormal, Coles & Jones (1991) suggested 
that it might be a reasonable approximation to simply map 
initially Gaussian fluctuations to non-Gaussian ones by set- 
ting exp(5i n it) = 1 + tfenai- However, this mapping is only 
approximate, and it is not obvious how one might assign ve- 
locities, or account for the difference in clustering between 
the dark matter and galaxies. In addition, although it is pos- 
sible to generate one-point higher-order moments by a local 
transformation of a Gaussian field, this does not generate 
the correct configuration dependence of multi-point corre- 
lation functions, which in gravitational instability are the 
result of the non-local character of Poisson's equation. 

Sheth & Saslaw (1994) described a halo-based algo- 
rithm which is similar in spirit to the one presented here. 
However, their prescription had no large-scale correlations 
(their haloes were given a Poisson spatial distribution), and 
they did not describe how to incorporate velocities into 
their model, nor how to allow for differences between the 
dark matter and the galaxy distributions. Bond & Myers 
(1996) described a halo-based algorithm which did include 
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spatial correlations and used the Zel'dovich approximation 
for velocities. However, their algorithm was rather computa- 
tionally intensive. Moreover, although it produced approxi- 
mately the right number of massive halos, it was less accu- 
rate at the low mass end. 

Our PTHalos algorithm represents a real improvement 
because: 

• it has correct two- and three-point correlations on large- 
scales; 

• it has accurate higher-order moments on all scales; 

• it includes realistic velocity correlations; and 

• it allows one to model the differences between the 
distributions of dark matter and different galaxy types — 
nontrivial biasing effects are rather simple to incorporate. 

There are a number of inputs to PTHalos which are 
easily modified. For example, we distribute particles around 
halo centres so that they follow an NFW profile; this is eas- 
ily changed to one's favourite profile — a tophat, an isother- 
mal sphere, or one of the profiles described by Hernquist 
(1990) or Moore et al. (1999). Also, our code assumes that 
all haloes are smooth, whereas haloes in simulations have 
substructure. Neglecting this fact is reasonable on all but 
the very smallest scales because substructure accounts for 
only about fifteen percent of the mass of a halo (e.g. Ghigna 
et al. 2000). In any case, Blasi & Sheth (2000) provide simple 
fitting formulae to Ghigna et al.'s simulations which make 
including substructure straightforward. 

It is slightly more complicated to generate haloes with 
a range of ellipticities. There are, at present, no convenient 
parametrizations of the distribution of halo shapes in n-body 
simulations (see however Dubinski & Carlberg 1992) . The el- 
lipsoidal collapse model which reproduces the correct halo 
mass function makes specific predictions for this distribu- 
tion (Sheth, Mo & Tormen 2001); in principle, it could be 
used to specify the distribution of shapes. The simplification 
of spherical halos leads to discrepancies in the bispectrum, 
at small scales the reduced bispectrum Q looses the depen- 
dence on triangle shape at scales larger than those of the 
simulations (Scoccimarro et al. 2001). One the other hand, 
the numerical results in Ma & Fry (2000) suggest that except 
possibly for the smallest scales, the low-order spatial statis- 
tics at least are insensitive to whether or not the haloes are 
spherical. 

Improvements which we hope to include in the near 
future include the following. One of the places in which 
PTHalos is not as accurate as we would like is in its as- 
signment of virial velocities. At present, we assume that 
velocities within haloes are isotropic. Cole & Lacey (1996) 
show that this is not quite correct; haloes in simulations 
tend to have slightly more radial orbits near the edge of the 
halo. This is something we have included as a test but it 
turned out to be a small effect in the statistics we studied. 
However, in view of the deviations found for redshift-space 
higher-order moments, more work is needed to make sure 
our velocity statistics are accurate enough at small scales. 
In this connection, most importantly perhaps is that our 
present scheme which accounts for the fact that haloes may 
not overlap is somewhat adhoc. An improved treatment of 
exclusion, or a better algorithm for using the 2LPT density 
field to assign positions to the haloes output from the merger 
tree partition would be very useful. One promising option is 



to modify the approach described by Bond & Myers (1996), 
so that it can be applied to the 2LPT rather than the lin- 
ear fluctuation field, to assign positions to the most massive 
halos. One could then use the merger tree algorithm to as- 
sign the mass which remains to less massive haloes — one of 
the advantages of Sheth & Lemson's (1999b) merger tree 
algorithm is that it is easily adapted to construct partitions 
when it is known that some of the mass has already been 
assigned to more massive haloes. 

To model deep galaxy redshift surveys, the angular cor- 
relations of galaxies, weak lensing observations, quasar ab- 
sorption lines and the Ly-a forest, or the thermal and kine- 
matic Sunyaev-Zel'dovich effects, rather than providing par- 
ticle positions and velocities at a fixed epoch, it is more 
useful to have a simulation output particle positions and ve- 
locities along the observer's light-cone. In PTHalos this is 
particularly straightforward, and is the subject of ongoing 
work. 

Finally, another issue that is important for generating 
mock galaxy surveys is to adapt the simulation to the re- 
quired survey geometry and selection function. At present 
we generate a cubical volume that encloses the survey vol- 
ume and whose mean (constant) density matches the maxi- 
mum density required by the survey in question, as usually 
done in mock catalogues constructed from n-body simula- 
tions (see e.g. Cole et al 1998 for a detailed exposition). 
This requires a lot more galaxies than really needed. The 
efficiency of PTHalos will be improved significantly by in- 
cluding these considerations from the start, rather than im- 
posing them at the end. We hope to implement this in the 
next version of the code, which will become publically avail- 
able. 
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